Despite recent progress on trajectory planning of multiple robots and path planning of a single tethered robot, planning of multiple tethered robots to reach their individual targets without entanglements remains a challenging problem. In this paper, we present a complete approach to address this problem. Firstly, we propose a multi-robot tether-aware representation of homotopy, using which we can efficiently evaluate the feasibility and safety of a potential path in terms of (1) the cable length required to reach a target following the path, and (2) the risk of entanglements with the cables of other robots. Then, the proposed representation is applied in a decentralized and online planning framework that includes a graph-based kinodynamic trajectory finder and an optimization-based trajectory refinement, to generate entanglement-free, collision-free and dynamically feasible trajectories. The efficiency of the proposed homotopy representation is compared against existing single and multiple tethered robot planning approaches. Simulations with up to 8 UAVs show the effectiveness of the approach in entanglement prevention and its real-time capabilities. Flight experiments using 3 tethered UAVs verify the practicality of the presented approach.
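To make the homotopy machinery concrete, here is a minimal sketch of one classical way to encode 2D path homotopy: a reduced word of signed crossings with an upward vertical ray cast from each obstacle. The paper's multi-robot, tether-aware representation is more elaborate; the function name, ray convention, and obstacle points below are illustrative assumptions.

```python
def h_signature(path, obstacle_points):
    """Reduced word of signed crossings between a 2D polyline and an
    upward vertical ray cast from each obstacle point. Paths in the
    same homotopy class (same endpoints, obstacles punctured out)
    produce the same word. Crossings exactly on a ray are ignored."""
    word = []
    for p, q in zip(path[:-1], path[1:]):
        for k, (ox, oy) in enumerate(obstacle_points):
            if (p[0] - ox) * (q[0] - ox) < 0:          # strict x-crossing
                t = (ox - p[0]) / (q[0] - p[0])
                if p[1] + t * (q[1] - p[1]) >= oy:     # above the obstacle
                    sign = 1 if q[0] > p[0] else -1
                    if word and word[-1] == (k, -sign):
                        word.pop()                      # cancel inverse pair
                    else:
                        word.append((k, sign))
    return tuple(word)

# Two ways around obstacle 0 get different signatures.
obs = [(0.0, 0.0)]
above = [(-1.0, 1.0), (1.0, 1.0)]
below = [(-1.0, 1.0), (0.0, -1.0), (1.0, 1.0)]
assert h_signature(above, obs) != h_signature(below, obs)
```

An invariant of this kind is cheap to compare, which is what lets a tether-aware planner screen candidate paths for required cable length and entanglement risk without simulating the cables.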
While feature association with a global map has significant benefits, most lidar-based odometry and mapping methods opt to associate features with local maps at a single voxel scale to keep the computational cost from growing unbounded. Taking advantage of the fact that surfels (surface elements) at different voxel scales can be organized in a tree-like structure, we propose an octree-based global map of multi-scale surfels that can be updated incrementally, alleviating the need to repeatedly recompute, for example, a k-d tree of the whole map. The system can also take input from a single sensor or from multiple sensors, reinforcing robustness in degenerate cases. We further propose a point-to-surfel (PTS) association scheme and continuous-time optimization over PTS and IMU preintegration factors, along with loop closure and bundle adjustment, forming a complete framework for lidar-inertial continuous-time odometry and mapping. Experiments on public and in-house datasets demonstrate the advantages of our system over other state-of-the-art methods. To benefit the community, we release the source code and dataset at https://github.com/brytsknguyen/slict.
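As a rough illustration of the incremental bookkeeping such a map needs, here is a sketch of a single surfel cell updated point by point with a Welford-style recursion, so the plane fit never has to be recomputed from scratch. The actual octree organization and multi-scale logic in SLICT are more involved; the class and field names are made up for illustration.

```python
import numpy as np

class Surfel:
    """A single octree-cell surfel kept as running statistics so points
    can be inserted incrementally, without re-fitting from scratch."""
    def __init__(self):
        self.n = 0
        self.mean = np.zeros(3)
        self.scatter = np.zeros((3, 3))   # sum of outer products about the mean

    def insert(self, p):
        """Welford-style update of mean and scatter with one 3D point."""
        self.n += 1
        delta = p - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, p - self.mean)

    def plane(self):
        """Plane fit: normal = eigenvector of the covariance with the
        smallest eigenvalue; also return a simple flatness score."""
        cov = self.scatter / max(self.n - 1, 1)
        w, v = np.linalg.eigh(cov)               # ascending eigenvalues
        flatness = 1.0 - w[0] / max(w[2], 1e-12)
        return v[:, 0], self.mean.copy(), flatness

s = Surfel()
for p in np.random.randn(200, 3) * [1.0, 1.0, 0.01]:   # near-planar cloud
    s.insert(p)
normal, center, flatness = s.plane()                    # normal ~ +-z
```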
As an important biometric for human identification, human gait can be collected by passive sensors at a distance without the subject's cooperation, which plays a vital role in crime prevention, security detection, and other human identification applications. At present, most research performs gait recognition based on cameras and computer vision techniques. However, vision-based methods are not reliable under poor illumination, leading to performance degradation. In this paper, we propose a novel multimodal gait recognition method, GaitFi, which leverages WiFi signals and videos for human identification. In GaitFi, Channel State Information (CSI), which reflects the multipath propagation of WiFi, is collected to capture human gait, while videos are captured by a camera. To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi that integrates WiFi and vision features for the gait retrieval task. The network is trained with triplet loss and classification loss on features at different levels. Extensive experiments are conducted in the real world, showing that GaitFi outperforms state-of-the-art gait recognition methods based on a single WiFi or camera modality, achieving 94.2% accuracy on a human identification task with 12 subjects.
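A minimal sketch of the two-stream idea follows, with simple MLP encoders standing in for the LRCN backbones and toy feature dimensions; only the fusion and the combined triplet-plus-classification training signal are meant to be faithful.

```python
import torch
import torch.nn as nn

class TwoStreamGait(nn.Module):
    """Two-stream fusion in the spirit of GaitFi: one encoder per
    modality (CSI, video), concatenated embeddings, shared classifier."""
    def __init__(self, csi_dim=270, vid_dim=512, emb=128, n_subjects=12):
        super().__init__()
        self.csi_enc = nn.Sequential(nn.Linear(csi_dim, 256), nn.ReLU(),
                                     nn.Linear(256, emb))
        self.vid_enc = nn.Sequential(nn.Linear(vid_dim, 256), nn.ReLU(),
                                     nn.Linear(256, emb))
        self.classifier = nn.Linear(2 * emb, n_subjects)

    def forward(self, csi, vid):
        z = torch.cat([self.csi_enc(csi), self.vid_enc(vid)], dim=-1)
        return z, self.classifier(z)

model = TwoStreamGait()
ce, tri = nn.CrossEntropyLoss(), nn.TripletMarginLoss(margin=1.0)
csi, vid = torch.randn(3, 270), torch.randn(3, 512)    # anchor, positive, negative
z, logits = model(csi, vid)
loss = ce(logits, torch.tensor([0, 0, 1])) + tri(z[0:1], z[1:2], z[2:3])
loss.backward()
```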
Although simultaneous localization and mapping (SLAM) has been an active research topic for decades, current state-of-the-art methods still suffer from instability or inaccuracy in many civilian environments, due to insufficient features or inherent estimation drift. To address these issues, we propose a navigation system that combines SLAM with prior-map-based localization. Specifically, we consider the additional integration of line and plane features, which are ubiquitous and structurally more salient in civilian environments, to ensure feature sufficiency and localization robustness. More importantly, we incorporate general prior map information into SLAM to bound its drift and improve accuracy. To avoid strict association between the prior information and local observations, we parameterize the prior knowledge as low-dimensional structural priors, defined as relative distances/angles between different geometric primitives. Localization is formulated as a graph-based optimization problem over a sliding window of variables and factors, including IMU, heterogeneous features, and structural priors. We also derive analytical expressions of the Jacobians of the different factors to avoid the overhead of automatic differentiation. To further alleviate the computational burden of incorporating structural prior factors, a selection mechanism based on the so-called information gain is employed so that only the most effective structural priors enter the graph optimization. Finally, the proposed framework is extensively tested on synthetic data, public datasets, and, more importantly, real-world experiments. The results show that the proposed scheme can effectively improve the accuracy and robustness of localization for autonomous robots in civilian applications.
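To illustrate the flavor of a structural prior factor and its analytical Jacobian, here is a toy residual on the relative distance between two point landmarks, plus a crude information-gain score for factor selection. The paper's primitives, factors, and selection criterion are richer; all names and noise values here are assumptions.

```python
import numpy as np

def distance_prior_residual(p_i, p_j, d_prior):
    """Residual of a relative-distance structural prior between two
    landmarks, with analytical Jacobians (no autodiff needed)."""
    diff = p_i - p_j
    d = np.linalg.norm(diff)
    J_i = diff / d                 # d(residual)/d(p_i), shape (3,)
    return d - d_prior, J_i, -J_i

def information_gain(J, Sigma, sigma_meas=0.05):
    """Toy information gain of adding one scalar factor: reduction in
    differential entropy (log-det) of the state covariance."""
    J = J.reshape(1, -1)
    S = J @ Sigma @ J.T + sigma_meas ** 2
    Sigma_post = Sigma - (Sigma @ J.T) @ (J @ Sigma) / S
    return 0.5 * (np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(Sigma_post)[1])

r, J_i, J_j = distance_prior_residual(np.array([1.0, 0, 0]), np.zeros(3), 1.2)
gain = information_gain(J_i, 0.1 * np.eye(3))   # keep the factor only if gain is large
```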
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and can thus be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
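For intuition, here is a generic sketch of fairness-regularized distillation: a temperature-scaled KD term plus a demographic-parity penalty on the student's predictions. Note this is not RELIANT's actual objective, which mitigates bias through a learnable proxy; the penalty form, `T`, and `lam` are illustrative.

```python
import torch
import torch.nn.functional as F

def fair_kd_loss(student_logits, teacher_logits, sens, T=2.0, lam=0.5):
    """Temperature-scaled KD plus a demographic-parity gap between two
    sensitive groups (sens is a boolean mask; both groups assumed
    non-empty). Generic illustration only, not RELIANT's objective."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    p = F.softmax(student_logits, dim=-1)[:, 1]        # positive-class prob
    dp_gap = (p[sens].mean() - p[~sens].mean()).abs()
    return kd + lam * dp_gap

loss = fair_kd_loss(torch.randn(8, 2), torch.randn(8, 2),
                    torch.tensor([1, 0, 1, 0, 1, 0, 1, 0], dtype=torch.bool))
```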
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our method generates higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-traced samples.
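A sketch of the temporal-accumulation step, under common assumptions (motion vectors are available, and a small hypothetical `alpha_net` predicts the blend weight from a correlation cue), might look like this; the paper's network is learned end-to-end and more elaborate.

```python
import torch
import torch.nn.functional as F

def temporal_accumulate(curr, prev, motion, alpha_net):
    """Warp the previous frame's features with per-pixel motion vectors,
    then blend with the current frame using a weight predicted from a
    correlation cue. curr/prev: (B,C,H,W); motion: (B,2,H,W) in pixels,
    pointing to where each pixel came from; alpha_net: conv mapping
    (B,C+1,H,W) -> (B,1,H,W)."""
    B, C, H, W = curr.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().expand(B, H, W, 2)
    grid = grid + motion.permute(0, 2, 3, 1)
    grid[..., 0] = grid[..., 0] / (W - 1) * 2 - 1      # normalize to [-1, 1]
    grid[..., 1] = grid[..., 1] / (H - 1) * 2 - 1
    warped = F.grid_sample(prev, grid, align_corners=True)
    corr = (curr * warped).sum(1, keepdim=True) / C    # per-pixel correlation
    alpha = torch.sigmoid(alpha_net(torch.cat([curr, corr], dim=1)))
    return alpha * warped + (1 - alpha) * curr

alpha_net = torch.nn.Conv2d(17, 1, kernel_size=3, padding=1)
out = temporal_accumulate(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32),
                          torch.zeros(1, 2, 32, 32), alpha_net)
```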
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we make the following contributions. First, we design a meta-architecture that decouples part features from things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure the task from both pixel-region and part-whole perspectives; it also decouples the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++ with a new part-whole cross-attention scheme, realized as a part-whole interaction via masked cross attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the original Panoptic-PartFormer, Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results at a significantly lower cost: 70% fewer GFLOPs and 50% fewer parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
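The masking mechanism such a scheme builds on (following Mask2Former) can be sketched as follows; the part-whole coupling between query sets in Panoptic-PartFormer++ is not shown.

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(q, k, v, attn_mask):
    """Each query attends only to pixels inside its coarse predicted
    mask (Mask2Former-style). q: (B,Q,D); k, v: (B,N,D);
    attn_mask: (B,Q,N) bool, True = attend. Queries whose mask is
    empty fall back to full attention."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    masked = scores.masked_fill(~attn_mask, float("-inf"))
    empty = ~attn_mask.any(dim=-1, keepdim=True)       # (B,Q,1)
    return F.softmax(torch.where(empty, scores, masked), dim=-1) @ v

q, k, v = torch.randn(1, 4, 64), torch.randn(1, 100, 64), torch.randn(1, 100, 64)
out = masked_cross_attention(q, k, v, torch.rand(1, 4, 100) > 0.5)
```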
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The CLIP embedding design enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned ones.
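A sketch of CLIP-driven conditioning follows, using the public OpenAI CLIP API with a hypothetical parameter-generating head over toy 3D features; the prompt template and organ list are illustrative, and the paper's head design differs in detail.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

organs = ["liver", "spleen", "pancreas", "liver tumor"]      # illustrative subset
tokens = clip.tokenize([f"a computerized tomography of a {o}" for o in organs])
with torch.no_grad():
    text_emb = model.encode_text(tokens.to(device)).float()  # (K, 512)

# Hypothetical head: each class embedding generates the parameters of a
# 1x1x1 conv that turns shared voxel features into that class's mask.
feat_dim, K = 64, len(organs)
param_head = torch.nn.Linear(text_emb.shape[-1], feat_dim + 1).to(device)
voxel_feat = torch.randn(1, feat_dim, 8, 32, 32, device=device)  # (B,C,D,H,W)
wb = param_head(text_emb)                                    # (K, feat_dim+1)
w, b = wb[:, :feat_dim], wb[:, feat_dim]
logits = torch.einsum("bcdhw,kc->bkdhw", voxel_feat, w) + b.view(1, K, 1, 1, 1)
masks = torch.sigmoid(logits)                                # one channel per class
```

Because classes are addressed through text embeddings rather than a fixed output layer, adding a new organ or tumor type amounts to adding a prompt, which is what makes extension without catastrophic forgetting plausible.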
This paper presents the technologies behind user next-intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent; it is an offline concept knowledge graph in the life-service domain that models users' historical behaviors, the rich content users interact with, and the relations between them. We further introduce a Transformer-based model that integrates expert rules from the knowledge graph to infer a user's next intent online. Experimental results demonstrate that the proposed system can effectively enhance the performance of downstream tasks while retaining explainability.
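As a sketch of the serving-side idea, the following toy model encodes a user's intent-ID sequence with a Transformer and lets expert rules veto candidate intents through a logits mask; all sizes and the masking scheme are assumptions, not AlipayKG's actual design.

```python
import torch
import torch.nn as nn

class NextIntentModel(nn.Module):
    """Transformer over a user's intent-ID sequence, predicting the next
    intent; expert rules veto candidates via a boolean logits mask."""
    def __init__(self, n_intents=1000, d=128):
        super().__init__()
        self.emb = nn.Embedding(n_intents, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, n_intents)

    def forward(self, intent_seq, rule_mask=None):
        h = self.encoder(self.emb(intent_seq))     # (B, T, d)
        logits = self.out(h[:, -1])                # predict from the last step
        if rule_mask is not None:
            logits = logits.masked_fill(~rule_mask, float("-inf"))
        return logits

m = NextIntentModel()
logits = m(torch.randint(0, 1000, (2, 16)), rule_mask=torch.rand(2, 1000) > 0.1)
```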
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment-effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and maintain low confidence in such black-box systems, a problem exacerbated by poor generalization on out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the black box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes AI's behavior visible in regions of uncertainty and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in clinical settings. Our work sheds light on safe clinical applications and explainable AI, and can contribute to trustworthiness in the medical domain.
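The standard subjective-logic recipe that evidential segmentation models of this kind build on can be sketched as follows (EvidenceCap's exact formulation may differ in detail):

```python
import torch
import torch.nn.functional as F

def evidential_prediction(logits):
    """Subjective-logic outputs for K-class segmentation logits
    (B,K,H,W): non-negative evidence, Dirichlet parameters, expected
    probabilities, belief masses, and per-pixel vacuity u = K / S."""
    K = logits.shape[1]
    evidence = F.softplus(logits)          # e_k >= 0
    alpha = evidence + 1.0                 # Dirichlet concentration
    S = alpha.sum(dim=1, keepdim=True)     # Dirichlet strength
    prob = alpha / S                       # expected class probabilities
    belief = evidence / S                  # belief mass per class
    uncertainty = K / S                    # high where total evidence is weak
    return prob, belief, uncertainty

prob, belief, u = evidential_prediction(torch.randn(1, 4, 64, 64))
```

The vacuity map is what makes the model's ignorance visible per pixel: it is high precisely where total evidence is weak, such as ambiguous boundaries or OOD inputs.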